Abstract

The time evolution of an exactly solvable layered feedforward neural
network with three-state neurons that optimizes the mutual information
is studied for arbitrary synaptic noise (temperature). Detailed
stationary temperature-capacity and capacity-activity phase diagrams
are obtained. The model exhibits pattern retrieval, pattern-fluctuation
retrieval and spin-glass phases. The performance is improved, in the
form of both a larger critical capacity and a larger information
content, compared with three-state Ising-type layered network models.
Flow diagrams reveal that saddle-point solutions associated with
fluctuation overlaps considerably slow down the flow of the network
states towards the stable fixed points.

1 Introduction 

By now it is common knowledge that layered feedforward models are the
workhorses in practical applications of neural networks; hence, progress
in the theoretical understanding of their capabilities and limitations
should be welcome. Recently, it has been shown [1, 2] how information theory
can be used to construct neural network models leading to optimal
performance, optimal for the task of retrieving an embedded pattern when the
network starts far from it with a vanishingly small initial mutual
information. For two-state networks this approach recovers the well-known
Hopfield model [3] and for three-state networks a Hamiltonian is found
reminiscent of the Blume-Emery-Griffiths (BEG) model [3] (see [3] for further
references in a spin-glass context) with a novel Hebbian-like learning rule.

Both the extremely diluted asymmetric version ^ H3 [7j and the fully 
connected version [21 El El HQJ of this model have already been studied. 
These studies reveal that the retrieval performance of these so-called BEG
networks is better than that of other three-state networks of the same
architecture (see [11, 12] and references therein), in the sense that
there is a selective increase in the information content of the network and
that a considerably larger retrieval region exists in the phase diagrams,
leading also to a sizable increase in the critical capacity. In particular,
new information-carrying states, the so-called quadrupolar or
pattern-fluctuation retrieval states, appear. They play an explicit role in
enhancing the retrieval performance of the network and they might also be
important in practical applications. In pattern recognition, e.g., looking at
a black and white picture on a grey background, these states would describe
the situations where the exact location of the picture with respect to the
background is known but the details of the picture itself are not in focus.
Furthermore, these pattern-fluctuation retrieval states might be helpful in
modelling the focusing problems discussed in the framework of cognitive
neuroscience [13]. Consequently, a study of this BEG model with a layered
architecture is relevant.

Moreover, the study of this system is interesting in itself as an exactly
solvable non-trivial dynamical system. Compared with the extremely
diluted asymmetric architecture it contains correlations among the neurons
because of the presence of common ancestors, although there are no
feedback loops. Nevertheless, these correlations can be handled exactly
giving rise to layer-to-layer evolution equations in closed form, as also seen 
in Ising-type models [T^lEi]. Since in the diluted BEG-architecture long 
transients appear due to the presence of saddle-point solutions which
considerably slow down the dynamics of the network [7], it is worthwhile to
find out whether such a dynamical behavior survives in the presence of
correlations. Finally, one would also like to find out what further
differences exist between these BEG architectures and their analogues in
other three-state models.

The outline of the paper is the following. In Section 2 we introduce 
the three-state layered BEG network model and the relevant macroscopic 
variables. In Section 3 we solve the dynamics of this model by deriving the
recursion relations for these variables. We discuss the results in Section 4,
for both the stationary phase diagrams and the dynamic flow diagrams. We 
end with some concluding remarks in Section 5. 



2 The Model 

Consider a network that consists of $L$ layers, where each layer index may
be taken as a time step $t$. On each layer there are $N$ neurons that can
take values $\sigma_i^t$, $t = 1, \ldots, L$; $i = 1, \ldots, N$, from the set
$S = \{-1, 0, +1\}$, where $\pm 1$ denote the active states. A macroscopic
number of $p = \alpha N$ ternary patterns is taken from a set of independent
identically distributed random variables $\{\xi_i^{\mu,t} = 0, \pm 1\}$,
$\mu = 1, \ldots, p$, where $\pm 1$ are the active patterns, with the
following probability distribution on layer $t$,

$$\mathrm{Prob}(\xi_i^{\mu,t}) = a\,\delta(|\xi_i^{\mu,t}|^2 - 1) + (1 - a)\,\delta(\xi_i^{\mu,t}) . \tag{1}$$

This distribution is assumed to be the same for every layer and the mean
over it, $a = \langle(\xi_i^{\mu,t})^2\rangle$, denotes the activity of the
patterns. Together with this, a set $\{\eta_i^{\mu,t}\}$ of normalized
fluctuations of the binary patterns $(\xi_i^{\mu,t})^2$ about their average
is introduced,

$$\eta_i^{\mu,t} = \frac{(\xi_i^{\mu,t})^2 - a}{a(1 - a)} . \tag{2}$$

Both patterns and fluctuations are embedded in the network by means of
a generalized learning rule that consists of two Hebbian-like parts,

$$J_{ij}^t = \frac{1}{a^2 N}\sum_{\mu=1}^{p}\xi_i^{\mu,t+1}\xi_j^{\mu,t} , \qquad K_{ij}^t = \frac{1}{N}\sum_{\mu=1}^{p}\eta_i^{\mu,t+1}\eta_j^{\mu,t} . \tag{3}$$

The first part is the usual rule in a three-state layered network that codifies
the patterns, while the second part codifies the fluctuations of the binary
active patterns $(\xi_i^{\mu,t})^2$ about their average.
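As an illustration of Eqs. (1)-(3), the following Python sketch draws ternary patterns, forms the normalized fluctuations and assembles the two coupling matrices between consecutive layers. It assumes NumPy, takes the two active values ±1 as equiprobable (probability a/2 each), and uses the prefactors 1/(a²N) and 1/N for the two Hebbian-like parts; the function names are illustrative and do not come from the paper.

```python
import numpy as np

def sample_patterns(p, N, a, rng):
    """Draw p ternary patterns on N sites: +1 or -1 with probability a/2 each, 0 otherwise."""
    return rng.choice([-1, 0, 1], size=(p, N), p=[a / 2, 1 - a, a / 2])

def fluctuations(xi, a):
    """Normalized fluctuations eta = (xi^2 - a) / (a (1 - a)) of the binary patterns xi^2."""
    return (np.asarray(xi) ** 2 - a) / (a * (1 - a))

def couplings(xi_next, xi_prev, a):
    """Hebbian-like couplings from layer t (xi_prev) to layer t+1 (xi_next), both of shape (p, N)."""
    N = xi_prev.shape[1]
    J = xi_next.T @ xi_prev / (a**2 * N)                                # codifies the patterns
    K = fluctuations(xi_next, a).T @ fluctuations(xi_prev, a) / N       # codifies the fluctuations
    return J, K

# Small example (sizes chosen only for illustration).
rng = np.random.default_rng(0)
a, p, N = 0.6, 5, 400
xi_t, xi_t1 = sample_patterns(p, N, a, rng), sample_patterns(p, N, a, rng)
J, K = couplings(xi_t1, xi_t, a)
```

With these prefactors, the field `J @ sigma` equals the sum over patterns of the next-layer pattern times the retrieval overlap divided by a, which is the form the local fields take in Section 3.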

Given the configuration on the first layer, $\sigma_N^{1} = \{\sigma_i^{1}\}$,
$i = 1, \ldots, N$, the state of a unit $\sigma_i^{t+1}$ on layer $t+1$ is
determined by the configuration $\sigma_N^t$ of the units in the previous
layer according to the stochastic law

$$\mathrm{Prob}(\sigma_i^{t+1} = s \in S \,|\, \sigma_N^t) = \frac{\exp[-\beta\epsilon_i(s|\sigma_N^t)]}{\sum_{s' \in S}\exp[-\beta\epsilon_i(s'|\sigma_N^t)]} . \tag{4}$$

In this expression, the single-site energy function for unit $i$ on layer
$t+1$, $\epsilon_i(s|\sigma_N^t)$, is given by

$$\epsilon_i(s|\sigma_N^t) = -s\,h_i^{t+1}(\sigma_N^t) - s^2\,\theta_i^{t+1}(\sigma_N^t) , \tag{5}$$

where

$$h_i^{t+1}(\sigma_N^t) = \sum_{j=1}^{N}J_{ij}^t\sigma_j^t , \qquad \theta_i^{t+1}(\sigma_N^t) = \sum_{j=1}^{N}K_{ij}^t(\sigma_j^t)^2 \tag{6}$$

are the local fields acting on that unit. In distinction to the usual
three-state model, where the coefficient of the quadratic part in
$\epsilon_i(s|\sigma_N^t)$ is an externally adjustable threshold parameter,
we have here a random self-adjusting function
$\theta_i^{t+1}(\sigma_N^t, \{\xi_i^{\mu,t}\})$ that depends on both the
states of the network and the patterns.
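The stochastic law of Eqs. (4)-(6) can be sketched as a single layer update. This is a minimal illustration assuming NumPy and a single-site energy of the form -s h - s² θ; the function name `update_layer` and the inverse-CDF sampling scheme are choices made here, not taken from the paper.

```python
import numpy as np

def update_layer(sigma, J, K, beta, rng):
    """Sample the states on layer t+1 from those on layer t with the three-state Boltzmann rule."""
    sigma = np.asarray(sigma, dtype=float)
    h = J @ sigma            # local field coupled to s
    theta = K @ sigma**2     # local field coupled to s^2
    # Log-weights -beta * eps(s) for s = -1, 0, +1, with eps(s) = -s h - s^2 theta.
    logw = beta * np.stack([-h + theta, np.zeros_like(h), h + theta])
    logw -= logw.max(axis=0)                 # shift to avoid overflow in exp
    prob = np.exp(logw)
    prob /= prob.sum(axis=0)
    cum = np.cumsum(prob, axis=0)
    u = rng.random(h.shape)
    # Inverse-CDF sampling: index 0, 1, 2 maps to state -1, 0, +1.
    idx = (u >= cum[0]).astype(int) + (u >= cum[1]).astype(int)
    return idx - 1
```

Applying `update_layer` repeatedly, with the couplings of each pair of consecutive layers, propagates a configuration through the network one layer (time step) at a time.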

Next, we consider the relevant quantities that describe the performance
of the network. For both the macroscopic order parameters and the mutual
information, we need the conditional probability distribution
$\mathrm{Prob}(\sigma_i^t|\xi_i^{\mu,t})$ that a neuron $i$ is in the state
$\sigma_i^t$ on layer $t$ given that the site $i$ of the stored pattern to be
retrieved is $\xi_i^{\mu,t}$. As a consequence of the independence of the
states of the units on a given layer, it is sufficient to consider the
distribution for a single typical neuron, so we can omit the index $i$. We
also omit the layer index $t$ and take from previous work [17]

$$\mathrm{Prob}(\sigma|\xi^\mu) = (s_\xi^\mu + m^\mu\xi^\mu\sigma)\,\delta(\sigma^2 - 1) + (1 - s_\xi^\mu)\,\delta(\sigma) , \tag{7}$$

where

$$s_\xi^\mu = s^\mu + l^\mu(\xi^\mu)^2 , \qquad s^\mu = \frac{q_0 - a n^\mu}{1 - a} , \qquad l^\mu = \frac{n^\mu - q_0}{1 - a} . \tag{8}$$

Here, $m^\mu = \overline{\langle\sigma\rangle_{\sigma|\xi}\,\xi^\mu}/a$ is the
thermodynamic limit, $N \to \infty$, of the retrieval overlap

$$m_N^{\mu,t} = \frac{1}{aN}\sum_{i=1}^{N}\xi_i^{\mu,t}\sigma_i^t \tag{9}$$

between the state of the network and pattern $\{\xi_i^{\mu,t}\}$, where the
brackets denote the average over the probability distribution Eq. (7) and the
bar denotes the configurational average over the patterns. The other
parameters are the thermodynamic limits
$q_0 = \overline{\langle\sigma^2\rangle_{\sigma|\xi}}$ and
$n^\mu = \overline{\langle\sigma^2\rangle_{\sigma|\xi}(\xi^\mu)^2}/a$ of the
neural (dynamical) activity and the activity overlap, respectively,

$$q_{0,N}^t = \frac{1}{N}\sum_{i=1}^{N}(\sigma_i^t)^2 , \qquad n_N^{\mu,t} = \frac{1}{aN}\sum_{i=1}^{N}(\sigma_i^t)^2(\xi_i^{\mu,t})^2 . \tag{10}$$

Finally, $l^\mu = \overline{\langle\sigma^2\rangle_{\sigma|\xi}\,\eta^\mu}$ is
the thermodynamic limit of the fluctuation overlap between the binary state
variables $(\sigma_i^t)^2$ and $\eta_i^{\mu,t}$, defined as

$$l_N^{\mu,t} = \frac{1}{N}\sum_{i=1}^{N}(\sigma_i^t)^2\eta_i^{\mu,t} . \tag{11}$$

As is clear from its definition, the fluctuation overlap is connected with the
activity overlap. We remark that an underlying assumption that leads to
the BEG model, and that should be preserved in the implementation for any
network architecture, is that the dynamic activity $q_0 \sim a$. The necessity
of such an activity control system has been emphasized before (cf. [18, 19]
and references therein).

Next, the mutual information between patterns and neurons, regarding
the patterns as the inputs and the neuron states as the output of the network
channel on each layer, is an architecture-independent property given by
[20, 21]

$$I^\mu(\sigma, \xi) = S(\sigma) - S(\sigma|\xi) , \tag{12}$$

where

$$S(\sigma) = -q_0\ln(q_0/2) - (1 - q_0)\ln(1 - q_0) \tag{13}$$

is the entropy and $S(\sigma|\xi) = aS_a + (1 - a)S_{1-a}$ is the equivocation
term with

$$S_a = -c_+^\mu\ln c_+^\mu - c_-^\mu\ln c_-^\mu - (1 - n^\mu)\ln(1 - n^\mu) ,$$
$$S_{1-a} = -s^\mu\ln(s^\mu/2) - (1 - s^\mu)\ln(1 - s^\mu) . \tag{14}$$

Here, $c_\pm^\mu = (n^\mu \pm m^\mu)/2$ and $s^\mu$ is the parameter in the
conditional probability $\mathrm{Prob}(\sigma|\xi)$. The mutual information
can then be used to obtain the information $i^\mu = \alpha I^\mu$, where
$\alpha = p/N$ is the storage ratio of the network.
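To make Eqs. (12)-(14) concrete, the sketch below evaluates the mutual information from the macroscopic parameters m, n and s at activity a. It assumes NumPy, natural logarithms, the relation q0 = a n + (1 - a) s for the dynamic activity, and the convention 0 ln 0 = 0; the function names are illustrative.

```python
import numpy as np

def _xlnx(x):
    """x ln x with the convention 0 ln 0 = 0."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0, x, 1.0)
    return x * np.log(safe)

def mutual_information(m, n, s, a):
    """Mutual information I = S(sigma) - S(sigma|xi) for one condensed pattern."""
    q0 = a * n + (1 - a) * s                      # dynamic activity (assumed relation)
    ln2 = np.log(2.0)
    S = -_xlnx(q0) + q0 * ln2 - _xlnx(1 - q0)     # entropy of the neuron states
    c_plus, c_minus = (n + m) / 2, (n - m) / 2
    S_a = -_xlnx(c_plus) - _xlnx(c_minus) - _xlnx(1 - n)   # equivocation on active sites
    S_1a = -_xlnx(s) + s * ln2 - _xlnx(1 - s)               # equivocation on inactive sites
    return float(S - a * S_a - (1 - a) * S_1a)
```

For perfect retrieval (m = n = 1, s = 0) the function returns the entropy of the pattern distribution, and for m = 0 with n = s it returns zero; the information content then follows by multiplying with the storage ratio.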

3 Dynamics: Recurrence Relations 

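To solve the dynamics and obtain the recurrence relations for the
macroscopic variables we need the expressions for the local fields, which can
be written as

$$h_i^{t+1} = \frac{1}{a}\sum_{\mu=1}^{p}\xi_i^{\mu,t+1}m^{\mu,t} , \qquad \theta_i^{t+1} = \sum_{\mu=1}^{p}\eta_i^{\mu,t+1}l^{\mu,t} , \tag{15}$$

in terms of the actual overlaps

$$m^{\mu,t} = \overline{\langle\sigma^t\rangle\xi^{\mu,t}}/a , \qquad l^{\mu,t} = \overline{\langle(\sigma^t)^2\rangle\eta^{\mu,t}} . \tag{16}$$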

At this point we remark that the fluctuation overlap $l^{\mu,t}$ can be viewed
as the retrieval overlap between the binary states $\{(\sigma_i^t)^2\}$ and
the patterns $\{\eta_i^{\mu,t}\}$ and it is, in general, independent of the
retrieval overlap $m^{\mu,t}$. One would expect the fluctuation overlap to
become relevant for larger synaptic noise, when the states of the network no
longer distinguish between the active patterns. Indeed, it can be finite in a
state of dynamic activity without necessarily a finite retrieval overlap
$m^{\mu,t}$, as has been found before for both the extremely diluted and the
fully connected network. As will be seen in the next Section, the fluctuation
overlap is responsible for an enhancement of the information in most of the
retrieval regime and for a finite information carried in the absence of
retrieval.

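In Eq. (16) the brackets denote thermal averages with the probability
distribution Eq. (4). Now, $\langle\sigma^t\rangle$ and
$\langle(\sigma^t)^2\rangle$ are given, respectively, by

$$F_\beta(h^t, \theta^t) = \frac{\sinh(\beta h^t)}{\frac{1}{2}e^{-\beta\theta^t} + \cosh(\beta h^t)} , \qquad G_\beta(h^t, \theta^t) = \frac{\cosh(\beta h^t)}{\frac{1}{2}e^{-\beta\theta^t} + \cosh(\beta h^t)} , \tag{17}$$

which, in the zero temperature limit, $\beta \to \infty$, become

$$F_\infty = \mathrm{sign}(h)\,\Theta(|h| + \theta) , \qquad G_\infty = \Theta(|h| + \theta) , \tag{18}$$

where $\Theta(x)$ is the usual step function.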
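The transfer functions of Eqs. (17) and (18) can be evaluated as in the following sketch. It assumes NumPy and rewrites the ratios by dividing numerator and denominator by exp(β|h|)/2, which is algebraically equivalent but avoids overflow at low temperature; the name `F_G` and the use of `beta=np.inf` for the zero-temperature limit are conventions chosen here.

```python
import numpy as np

def F_G(h, theta, beta):
    """Return (F_beta, G_beta), i.e. (<s>, <s^2>), for local fields h and theta.

    beta = np.inf selects the zero-temperature step-function limit.
    """
    h, theta = np.broadcast_arrays(np.asarray(h, float), np.asarray(theta, float))
    if np.isinf(beta):
        active = (np.abs(h) + theta > 0).astype(float)
        return np.sign(h) * active, active
    e2 = np.exp(-2 * beta * np.abs(h))
    # Clip the exponent so that a very negative |h| + theta does not overflow.
    e0 = np.exp(np.minimum(-beta * (np.abs(h) + theta), 700.0))
    denom = e0 + 1 + e2
    return np.sign(h) * (1 - e2) / denom, (1 + e2) / denom
```

At large but finite β the returned values approach the step-function expressions, which gives a quick consistency check between the two limits.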

We assume that a single pattern, $\xi^{1,t}$, and fluctuation, $\eta^{1,t}$,
are condensed at each layer, that is, $m^{1,t}$ and $l^{1,t}$ are of order
$O(1)$, and that $m^{\mu,t}$ and $l^{\mu,t}$, $\mu > 1$, are of order
$O(1/\sqrt{N})$. We call the former $m^t$ and $l^t$, respectively.
In accordance with this, we also assume that in Eq. (8) $n^t = n^{1,t}$ and
$s^t = s^{1,t}$ are both of $O(1)$, for $\mu = 1$, and we denote the
information content of interest $i = i^1$. Following [15], each local field
may then be separated into a signal term and a noise term,

$$h_i^{t+1} = \xi_i^{1,t+1}m^t/a + z\Delta^t , \qquad \theta_i^{t+1} = \eta_i^{1,t+1}l^t + w\Omega^t , \tag{19}$$

where $z$ and $w$ are Gaussian random variables with zero mean and unit
variance. The layer-dependent variances of the local fields are given by

$$(\Delta^t)^2 = \frac{1}{a}\sum_{\mu>1}(m^{\mu,t})^2 , \qquad (\Omega^t)^2 = \frac{1}{a(1-a)}\sum_{\mu>1}(l^{\mu,t})^2 . \tag{20}$$

Together with Eqs. (16)-(17), we thus have recurrence relations for the
overlaps and for the variances of the local fields.

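The recurrence relations for the overlaps and the connecting equations
for the other dynamical variables become

$$m^{t+1} = \int Dz\int Dw\, F_\beta\!\left(\frac{m^t}{a} + z\Delta^t,\ \frac{l^t}{a} + w\Omega^t\right) , \tag{21}$$

$$n^{t+1} = \int Dz\int Dw\, G_\beta\!\left(\frac{m^t}{a} + z\Delta^t,\ \frac{l^t}{a} + w\Omega^t\right) , \tag{22}$$

$$s^{t+1} = \int Dz\int Dw\, G_\beta\!\left(z\Delta^t,\ -\frac{l^t}{1-a} + w\Omega^t\right) , \tag{23}$$

where, as usual, $Dx = \exp(-x^2/2)\,dx/\sqrt{2\pi}$. These expressions yield
the fluctuation overlap $l^{t+1} = n^{t+1} - s^{t+1}$ and the dynamic activity
$q_0^t = an^t + (1 - a)s^t$. A further variable,
$q_1^t = \overline{\langle\sigma^t\rangle^2}$, is introduced in the derivation
of the recurrence relations for the variances of the two noises [15] and it is
given by

$$q_1^t = \int Dz\int Dw\left[aF_\beta^2\!\left(\frac{m^t}{a} + z\Delta^t,\ \frac{l^t}{a} + w\Omega^t\right) + (1 - a)F_\beta^2\!\left(z\Delta^t,\ -\frac{l^t}{1-a} + w\Omega^t\right)\right] . \tag{24}$$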
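The double Gaussian integrals in Eqs. (21)-(24) lend themselves to Gauss-Hermite quadrature. The sketch below is one possible numerical treatment, not the authors' procedure: it assumes NumPy, finite β, signal terms m/a and l/a on active sites and -l/(1-a) on inactive sites, and a hypothetical function name `layer_step` with a freely chosen quadrature order.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def _F_G(h, theta, beta):
    """Finite-temperature averages <s> and <s^2> of a three-state unit (overflow-safe form)."""
    e2 = np.exp(-2 * beta * np.abs(h))
    e0 = np.exp(np.minimum(-beta * (np.abs(h) + theta), 700.0))
    denom = e0 + 1 + e2
    return np.sign(h) * (1 - e2) / denom, (1 + e2) / denom

def layer_step(m, l, Delta, Omega, a, beta, order=40):
    """One layer-to-layer step: returns (m, n, s, l, q1) on the next layer."""
    x, w = hermegauss(order)               # nodes and weights for the weight exp(-x^2 / 2)
    w = w / np.sqrt(2 * np.pi)             # normalize to the Gaussian measure Dx
    z, v = np.meshgrid(x, x, indexing="ij")
    W = np.outer(w, w)                     # product weights for the (z, w) double integral
    F1, G1 = _F_G(m / a + z * Delta, l / a + v * Omega, beta)      # sites with xi^2 = 1
    F0, G0 = _F_G(z * Delta, -l / (1 - a) + v * Omega, beta)       # sites with xi = 0
    m_new, n_new, s_new = np.sum(W * F1), np.sum(W * G1), np.sum(W * G0)
    q1 = np.sum(W * (a * F1**2 + (1 - a) * F0**2))
    return m_new, n_new, s_new, n_new - s_new, q1
```

Closing the recursion additionally requires the update of the noise widths Δ and Ω; here they are treated as inputs so that the sketch covers only the overlap equations.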

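Introducing the susceptibilities $\chi^t$ and $\psi^t$ with respect to the
$h^t$ and the $\theta^t$ fields,

$$\ldots \tag{25}$$

where $A = 1/a$, $B = 1/a(1 - a)$ and

$$p_1^t = \int Dz\int Dw\left[aG_\beta^2\!\left(\frac{m^t}{a} + z\Delta^t,\ \frac{l^t}{a} + w\Omega^t\right) + (1 - a)G_\beta^2\!\left(z\Delta^t,\ -\frac{l^t}{1-a} + w\Omega^t\right)\right] , \tag{26}$$

the recurrence relations for the Gaussian noises become

$$(\Delta^{t+1})^2 = \alpha A^2 q_0^t + (\chi^t)^2(\Delta^t)^2 , \qquad (\Omega^{t+1})^2 = \alpha B^2 q_0^t + (\psi^t)^2(\Omega^t)^2 . \tag{27}$$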



In contrast to these equations, the variances of the two local fields in
the extremely diluted network do not depend on the susceptibilities but
are simply given by the $q_0$-terms [7]. Since one expects somewhat different
behavior for the two architectures, it may be interesting to see how the
properties of the layered network model change over to those of the extremely
diluted network. This can be achieved most easily by means of a single
amplitude $D$, varying between 1 and 0, in front of both second terms in the
variance of the noises, as we will discuss at the end of the next Section.

With the above equations we also get the time evolution of the information
$i$ by means of Eqs. (12) to (14). The recurrence relations for the
macroscopic order parameters and the connecting equations can now be
used to study the evolution of the network and to determine the properties
of the stable stationary states. The stationary states are reached when
$m = m^{t+1} = m^t$, $n = n^{t+1} = n^t$ and $s = s^{t+1} = s^t$. Then
$l = l^{t+1} = l^t$ and also $q_0$, $q_1$ and $p_1$ reach stationary values.
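The stationarity condition can be checked numerically by iterating any layer-to-layer map until successive layers agree. The helper below is a generic sketch assuming NumPy; `step` stands for a user-supplied function that maps the vector of macroscopic variables on layer t to that on layer t+1, and the tolerance and layer cap are arbitrary choices. The returned layer count gives a rough measure of the length of the transient.

```python
import numpy as np

def iterate_to_stationarity(step, state, tol=1e-10, max_layers=10_000):
    """Iterate state -> step(state) until the largest change falls below tol.

    Returns the final state and the number of layers used.
    """
    state = np.asarray(state, dtype=float)
    for t in range(1, max_layers + 1):
        new = np.asarray(step(state), dtype=float)
        if np.max(np.abs(new - state)) < tol:
            return new, t
        state = new
    return state, max_layers

# Toy contraction used only to exercise the helper; it is not the network recursion.
fixed_point, layers = iterate_to_stationarity(lambda x: 0.5 * x + 1.0, [0.0])
```

Different initial conditions can end in different stationary states, so in practice one would scan a grid of starting values rather than rely on a single run.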

4 Thermodynamics and flow diagrams 

In this section we study both the stationary solutions and the flow diagrams
for the layered BEG model. Concerning the stable stationary states of the
network we find three kinds of phases. We have one or more retrieval phases
$R$ ($m > 0$, $l > 0$, $q_1 > 0$), one or more pattern-fluctuation retrieval
phases $Q$ ($m = 0$, $l > 0$, $q_1 > 0$), and a spin-glass phase $SG$
($m = 0$, $l = 0$, $q_1 > 0$). All three are sustained activity solutions in
the sense that $q_0 > 0$.
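When scanning numerical stationary solutions, the three phases can be told apart by a simple test on the order parameters. The sketch below is illustrative only; the function name and the tolerance used to decide whether a parameter vanishes are assumptions made here.

```python
def classify_phase(m, l, q1, tol=1e-6):
    """Label a stationary solution as 'R', 'Q' or 'SG' from (m, l, q1); 'other' otherwise."""
    m_zero = abs(m) <= tol
    l_zero = abs(l) <= tol
    if q1 > tol:
        if m > tol and l > tol:
            return "R"       # retrieval: m > 0, l > 0
        if m_zero and l > tol:
            return "Q"       # pattern-fluctuation retrieval: m = 0, l > 0
        if m_zero and l_zero:
            return "SG"      # spin glass: m = 0, l = 0
    return "other"
```

The label says nothing about stability; whether a solution is an attractor or a saddle point has to be decided separately, for instance from the flow around it.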

The existence of the $Q$ phases can be understood as follows. A non-zero $l$
will appear when the active binary neuron states $\sigma^2$, which do not
distinguish between $\pm 1$ neurons, coincide with the active patterns. At the
same time the actual $\pm 1$ active neuron states may fail to recognize the
active patterns, meaning $m = 0$. This is expected to occur at high $T$, where
a stable $Q$ phase should appear. There is a finite $q_1$ for either form of
the active neuron states. The presence of a stable $Q$ phase only at high $T$
has been checked already for both the extremely diluted [7] and the fully
connected network [2]. In particular, it appears in the phase diagram for
$\alpha = 0$, which is independent of the architecture.

Since for the extremely diluted network it is found that the $Q$ phase is
not stable at zero temperature, but is instead a saddle point [7] even for
non-zero $\alpha$, we consider first in Fig. 1 the capacity-activity phase
diagram at $T = 0$ for the layered network.

There is a stable retrieval phase below the heavy solid line and a second
retrieval phase with a smaller overlap appears in the lower shaded triangular
regions. The pattern-fluctuation retrieval states are only saddle-point
solutions below the light solid line. There is also everywhere a stable
spin-glass solution and all the lines denote discontinuous transitions. For
comparison, the heavy dashed line shows the retrieval phase boundary for the
optimal three-state Ising layered network, optimal in the sense that the
adjustable threshold parameter $\theta$ was chosen to optimize the storage
capacity $\alpha$. Clearly, for intermediate activity $a \in (0.435, 0.727)$
the BEG network has a larger critical storage capacity than the Ising
network [16].

Figure 1: The capacity-activity ($\alpha - a$) phase diagram for the BEG and
$Q = 3$-Ising networks at $T = 0$. The meaning of the regions and of the lines
is explained in the text.

Another performance measure is the information content $i$ of the network,
and in Fig. 2 we show the information-capacity diagrams for various
activities, at $T = 0$, for the BEG and Ising networks. Again, the BEG
performance is better for intermediate activity and the information content
is purely due, at this temperature, to the only stable retrieval phase. At
larger activity, $a = 0.8$ say, the BEG and Ising networks compete for better
performance at intermediate or larger $\alpha$ values, as seen in Fig. 2c.

In order to highlight the role of the temperature and the activity in the
appearance of a stable $Q$ phase, we consider the temperature dependence
for given activities and show in Fig. 3a the temperature-capacity diagram
for $a = 0.4$ and in Fig. 3b for $a = 0.8$. In the first one there is only a
stable retrieval phase (a second one in the shaded area) below the heavy phase
boundary, in addition to the spin-glass phase. There are no $Q$ states, not
even saddle-point solutions, in this region. In the case of $a = 0.8$,
instead, there is a single stable retrieval phase below the solid (dotted)
heavy lines, which denote discontinuous (continuous) transitions, with a
tricritical point at $T \simeq 0.753$ and $\alpha \simeq 0.018$. There is also
a stable (unstable saddle-point) $Q$ phase above (below) the heavy lines, as
indicated in the figure, and the $Q$ phase solutions end at discontinuous
transitions shown as solid light lines. There is also everywhere a stable
spin-glass solution which only disappears as $T \to \infty$. It is clear from
Fig. 3b that a stable $Q$ phase will only appear for sufficiently large
synaptic noise and large activity, a feature also found for the extremely
diluted [7] and the fully connected network [2].

Figure 2: The mutual information $i$ as a function of the capacity $\alpha$
for the BEG (solid line) and the $Q = 3$-Ising (dotted line) networks for
$T = 0$ and $a = 0.4$ (a), $a = 0.6$ (b) and $a = 0.8$ (c).

Figure 3: The temperature-capacity ($T$, $\alpha$) phase diagram for the BEG
network for $a = 0.4$ (a) and $a = 0.8$ (b). The meaning of the regions and
lines is explained in the text.

We consider now the role of the activity and show the capacity-activity
phase diagrams in Fig. 4a, for $T = 0.4$, and in Fig. 4b, for $T = 0.8$.
In the case of the former there is a stable retrieval phase below the heavy
solid line and a second stable retrieval phase in the shaded region. Again,
there is a considerably enhanced storage capacity for retrieval,
$\alpha \simeq 0.119$, for an intermediate activity of $a \simeq 0.676$. The
pattern-fluctuation retrieval states are only saddle-point solutions and this
is the case below the light solid line. When $T = 0.8$ there is a stable
retrieval phase below the heavy dotted (continuous transition) and heavy solid
(discontinuous transition) lines, which merge at a tricritical point at
$a \simeq 0.839$ and $\alpha \simeq 0.0036$. The pattern-fluctuation retrieval
solution is a stable phase below the light solid line and only a saddle point
below the heavy and light dotted lines. There is now a finite storage
capacity, $\alpha \approx 0.022$, at $a \approx 0.8$ for the retrieval of
active patterns as a pattern-fluctuation retrieval phase.

Figure 4: The capacity-activity ($\alpha$, $a$) phase diagram for the BEG
network at $T = 0.4$ (a) and $T = 0.8$ (b). The lines are explained in the
text.

Thus, as T increases, the useful performance of the network goes over 
from the retrieval to the pattern-fluctuation retrieval phase. Since the in- 
formation content is a common performance measure for both phases, we 
show its temperature dependence in an information-capacity phase diagram 
in Fig. 5, for a = 0.8 and various T, where the solid (dotted) lines represent 
information due to a stable retrieval (pattern-fluctuation retrieval) phase. 

Figure 5: The information $i$ as a function of the capacity $\alpha$ for
several values of $T$ and $a = 0.8$. The solid (dashed) lines correspond to
the $R$ ($Q$) states.

In all the above situations the SG solutions appear as a stable phase, and
we consider now the flow diagrams in the $(l, m)$ order-parameter space. We
show the results for $T = 0.8$, activity $a = 0.8$, for $\alpha = 0.005$ in
the presence of an $R$ phase in Fig. 6a, and for $\alpha = 0.01$ in the
presence of a $Q$ phase in Fig. 6b. These correspond to states on either side
of the dotted phase boundary in Fig. 3b. In the first one there is a stable
retrieval solution and there are two $Q$ saddle points (open circles), and in
the second one there is a stable and an unstable $Q$ solution. In both cases
there is a stable SG solution which can be accessed only below the lower $Q$
saddle point. The chains of dots indicate time steps and, as can be seen from
these figures, the flows to the stable solutions are considerably delayed by
the saddle points in the form of slow transients of the dynamics. A remarkable
feature of the flow diagrams is the presence of quite large basins of
attraction either to the stable $R$ state or to the stable $Q$ state, even for
the fairly high $T$ (and small $\alpha$) for this case. Also, not
surprisingly, one finds a much smaller basin of attraction to the SG states.
Similar features have also been found in the dynamics of the extremely diluted
network except for the SG states, which are absent in that case [7].

Figure 6: Two-dimensional retrieval and fluctuation overlap ($l - m$) flow
diagrams for the BEG network for $T = 0.8$, $a = 0.8$ and $\alpha = 0.005$ (a)
and $\alpha = 0.01$ (b). Open circles are saddle points, closed ones are
attractors.

We turn now to a brief study of the way in which the phase diagrams for
the layered BEG network turn into those for the extremely diluted network
by means of a variable amplitude $D$ in the second terms in Eq. (27), i.e.,

$$(\Delta^{t+1})^2 = \alpha A^2 q_0^t + D(\chi^t)^2(\Delta^t)^2 , \qquad (\Omega^{t+1})^2 = \alpha B^2 q_0^t + D(\psi^t)^2(\Omega^t)^2 . \tag{28}$$

When $D = 0$ we have the extremely diluted architecture, while $D = 1$
corresponds to the layered one. To be specific, consider the phase diagram for
the layered network at $T = 0.8$ shown in Fig. 4b, where the phase boundary
for existence of a stable $Q$ phase ends at $\alpha = 0$ when the activity
$a = 1$. As $D$ decreases from unity, this part of the phase boundary moves up
and the critical $\alpha$ gradually increases at $a = 1$. For $D = 0.9$ it
already becomes $\alpha \approx 0.034$. At the same time the maximum activity
$a$ for the lower retrieval phase boundary to a stable $R$ phase starts to
increase towards larger values. The phase boundary itself continues to end at
$\alpha = 0$. For $D = 0.5$, say, the maximum $a$ for retrieval is still less
than unity and the SG phase becomes now restricted to a region on both sides
of the left phase boundary in the $\alpha$ vs. $a$ phase diagram. Ultimately,
when the extremely diluted limit is reached, the stable $R$ phase goes up to
$a = 1$, still with $\alpha = 0$, with a fairly large $Q$ phase, and the SG
phase becomes a self-sustained activity phase $S$, with $m = 0$, $l = 0$ and
$q_1 = 0$. In part of the phase diagram the $S$ and $Q$ phases coexist, in
accordance with our earlier results on the extremely diluted network [7].
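The interpolation of Eq. (28) between the layered and the extremely diluted architecture amounts to a one-line change in the update of the noise widths. The sketch below is a hedged illustration: it reads the first terms as proportional to the dynamic activity q0, takes the two susceptibilities (called `chi` and `psi` here) as plain inputs rather than computing them, and uses a hypothetical function name.

```python
import math

def noise_step(Delta, Omega, q0, chi, psi, alpha, a, D=1.0):
    """Advance the noise widths (Delta, Omega) by one layer.

    D = 1 corresponds to the layered network, D = 0 to the extremely diluted one.
    """
    A = 1.0 / a
    B = 1.0 / (a * (1.0 - a))
    Delta2 = alpha * A**2 * q0 + D * chi**2 * Delta**2
    Omega2 = alpha * B**2 * q0 + D * psi**2 * Omega**2
    return math.sqrt(Delta2), math.sqrt(Omega2)
```

Sweeping `D` from 1 to 0 while recomputing the stationary states is the kind of scan the paragraph above describes; for `D = 0` the previous widths and the susceptibilities drop out of the update entirely.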

5 Concluding remarks 

We have derived the recursion relations that describe the time evolution of
the macroscopic variables for an exactly solvable three-state network on a
feed-forward layered architecture, optimizing the mutual information. This
so-called layered Blume-Emery-Griffiths (BEG) network shows stationary phase
diagrams distinct from those of either its extremely diluted or fully
connected versions studied in the literature. Being a truly dynamical system,
there is no phase boundary of local stability in the layered network between
either the retrieval phase, $R$, or the pattern-fluctuation retrieval phase,
$Q$, and the spin-glass phase, SG, in contrast to the behavior in the fully
connected network. But this does not mean that within the retrieval regime the
network cannot be trapped in SG states, as is clear from the flow diagrams.
This makes the layered network different from the extremely diluted network,
in which the $R$ or $Q$ states are the only stable states over most of the
regions where these phases exist.

We have found that, in common with both the extremely diluted and the
fully connected network, a stable pattern-fluctuation retrieval phase appears
only at high $T$ and for intermediate-to-large, but not full, activity $a$. At
low $T$, in particular at $T = 0$, this phase is not stable but is instead a
saddle-point solution. Nevertheless, the BEG layered network performs,
selectively, considerably better than the three-state Ising layered network:
not only is the retrieval capacity larger, as in the case of both the fully
connected and the extremely diluted network, but the information content is
also considerably larger. This additional information is due to the
enhancement by the pattern-fluctuation states of the stable retrieval phase at
low and intermediate $T$. The pattern-fluctuation retrieval phase, instead, is
responsible for a much smaller but non-negligible information content.

To summarize, the BEG network on a feed-forward layered structure is
not only an interesting dynamical system in itself but it also performs better
than other layered networks, for instance the three-state Ising one, within
relevant regimes of temperature and activity.